Automatic structuring and retrieval of large text files
Similar resources
Automatic structuring of text files
SUMMARY In many practical information retrieval situations, it is necessary to process heterogeneous text databases that vary greatly in scope and coverage, and deal with many different subjects. In such an environment it is important to provide flexible access to individual text pieces, and to structure the collection so that related text elements are identified and appropriately linked. Metho...
Automatic Text Decomposition and Structuring
Sophisticated text similarity measurements are used to determine relationships between natural-language texts and text segments. The resulting linked hypertext maps are used to identify different text types and text structures, leading to improved text access and utilization. Examples of text decomposition are given for expository and non-expository texts. The vector processing model of retriev...
Economical Inversion of Large Text Files
To provide keyword-based access to a large text file it is usually necessary to invert the file and create an inverted index that stores, for each word in the file, the paragraph or sentence numbers in which that word occurs. Inverting a large file using traditional techniques may take as much temporary disk space as is occupied by the file itself, and consume a great deal of CPU time. Here we d...
Automatic Text Categorization and Its Application to Text Retrieval
We develop an automatic text categorization approach and investigate its application to text retrieval. The categorization approach is derived from a combination of a learning paradigm known as instance-based learning and an advanced document retrieval technique known as retrieval feedback. We demonstrate the effectiveness of our categorization approach using two real-world document collections...
Improving the Automatic Retrieval of Text Documents
This paper reports on a statistical stemming algorithm based on link analysis. Considering that a word is formed by a prefix (stem) and a suffix, the key idea is that the interlinked prefixes and suffixes form a community of sub-strings. Thus, discovering these communities means searching for the best word splits that give the best word stems. The algorithm has been used in our participation in...
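As an illustration of the inverted index described in the "Economical Inversion of Large Text Files" abstract above (an index that stores, for each word, the paragraph numbers in which it occurs), here is a minimal in-memory sketch. It is not the economical method that paper proposes; the function name `build_inverted_index`, the blank-line paragraph convention, and the lowercase alphanumeric tokenization are all assumptions made for this example.

```python
import re
from collections import defaultdict


def build_inverted_index(text):
    """Map each lowercased word to the sorted paragraph numbers containing it.

    Hypothetical helper for illustration only: a naive in-memory inversion.
    """
    index = defaultdict(set)
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    for number, paragraph in enumerate(paragraphs, start=1):
        # Assumption: a "word" is a run of ASCII letters or digits.
        for word in re.findall(r"[a-z0-9]+", paragraph.lower()):
            index[word].add(number)
    return {word: sorted(numbers) for word, numbers in index.items()}


sample = (
    "Large text files need an index.\n\n"
    "An inverted index maps words to paragraphs.\n\n"
    "Text retrieval uses the index."
)
index = build_inverted_index(sample)
```

Looking up `index["text"]` then returns the paragraph numbers in which "text" occurs. Because this sketch holds the whole index in memory, it does nothing to address the temporary disk space and CPU cost that the abstract identifies as the problem for large files.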
Journal
Journal title: Communications of the ACM
Year: 1994
ISSN: 0001-0782,1557-7317
DOI: 10.1145/175235.175243